ABSTRACT

Results  are  reported  from  a  research  project  on  three-dimensional  motion  analysis.  The  work  is 
based  on  stereo  image  sequences  of  a  military  vehicle.  The  analysis  depends  on  the  segmentation  of  the 
image and the matching of features between the two images. Preliminary results are given for an extension
of the work to object recognition. Extracted features and the calculated parameters of motion are
incorporated  into  a  symbolic  matching  system.  Object  recognition  is  performed  by  the  symbolic  inexact 
matching  of  a  representation  of  the  photographs  with  a  database  of  prototypes.  The  use  of  control  points 
to  furnish  ground  truth  is  discussed. 


1.  INTRODUCTION 

The work described is current research being
performed collaboratively at the Center for
Artificial Intelligence, U.S. Army Engineer
Topographic Laboratories, and the Coordinated Science
Laboratory, University of Illinois at
Urbana-Champaign. Our long-term goal is to develop
prototype components of a threat-detection system
for a mobile robot. The research has centered on
determining the three-dimensional motion of a
vehicle in the field of view of the robot and, more
recently, on attempting to recognize the vehicle.

2.  APPROACH 

We decided to work with sequences of stereo
photographs of outdoor scenes because they
present more realism and challenge than table-top
models. Our motivation for using stereo is the
instability of motion estimation algorithms based
on monocular image sequences [1]. We included
photogrammetric control so that we could have a
form of ground truth to help evaluate success.
Motion determination was the first priority; parts of
the motion work were subsequently incorporated
into object recognition.

3.  IMAGE  DATA  BASE 

We first created a database of 18 stereo pairs
by photographing an M114 armored personnel
carrier (APC) in transit across the USAETL parking
lot. Besides the APC, our pictures include trees,
buildings, parked cars, a gas pump, and a
basketball hoop. We moved the APC to the first
position, took one stereo pair, then moved it to the
next position and stopped it for the next pair. The
image sequence gives the effect of a vehicle in
motion, but the motion is simulated. Everything
but the vehicle is stationary, including the two
cameras.

The imaging setup consisted of two Pentax 6 x 7
SLR cameras aligned (to the best of our ability)
so that the film planes lay in the same plane,
the optical axes were parallel, and the plane
containing them was parallel to the ground. The
focal length of the lenses was 105 mm, the
baseline was 3.6 m, and the distance from each optical
axis to the ground was 1.22 m.

The distance from the vehicle to the cameras
varies from 20 to 60 m. In addition, several stereo
image pairs were taken with control points
inserted in the scene. Each image (55 mm x 69.5
mm) was digitized on an Optronics drum scanner
to 601 x 751 picture elements with 8 bits per
picture element.
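
For the parallel-axis geometry described above, the range to a matched
point follows from its horizontal disparity. The following is a minimal
sketch, assuming an ideal pinhole model with no lens distortion. The
function names and the pixel pitch (55 mm of film divided by 601 picture
elements) are illustrative assumptions, not part of the original system.

```python
FOCAL_LENGTH_M = 0.105       # 105 mm lenses (from the text)
BASELINE_M = 3.6             # camera separation (from the text)
PIXEL_PITCH_M = 0.055 / 601  # assumed: 55 mm of film scanned to 601 pixels


def depth_from_disparity(disparity_px: float) -> float:
    """Range Z = f * B / d for ideal parallel-axis stereo."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    disparity_m = disparity_px * PIXEL_PITCH_M  # disparity measured on the film
    return FOCAL_LENGTH_M * BASELINE_M / disparity_m


def disparity_from_depth(depth_m: float) -> float:
    """Inverse relation: disparity in pixels for a point at range depth_m."""
    return FOCAL_LENGTH_M * BASELINE_M / (depth_m * PIXEL_PITCH_M)
```

Under these assumptions a point at 20 m would show a disparity of roughly
207 pixels and a point at 60 m roughly 69 pixels.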

We subsequently photographed a second
sequence of 20 stereo pairs with the same imaging
setup but with two vehicles instead of one in some
frames, less motion between frames than in
the first sequence, and a moving camera system.

We label our images Li, Ri, which denote,
respectively, the left and the right image of the ith
stereo pair (i = 1, 2, ..., 18 or 20). The time
instants at which the images were taken are
denoted by ti. For our experiments, sections of
various sizes were windowed out of the images.
Figure 1 shows a 128 x 256 pixel window of the
left image in the ninth stereo pair of the first
sequence.

Two consecutive stereo pairs are processed
at a time, e.g. L1, R1 and L2, R2.

4.  MOTION  DETECTION  AND  ESTIMATION 

We want to solve the following motion
problem: Given two stereo image pairs taken at
time instants t1 and t2 of a moving rigid object in
a stationary natural surrounding, determine the
3-D motion and structure* of the object. Our
approach is based on that described by Huang [2] in
the section "Motion from 3-D Feature
Correspondences." This method is:

Step  1.  Detect  and  segment  out  the  moving 
object  in  the  images,  thus  eliminating  the 
background. 

Step 2. Extract and then match features (points,
lines, circles) from the two images at t1
and then by triangulation determine the
3-D positions of these features. Do the
same for the two images at t2.

Step 3. Match the two sets of 3-D points or other
features at t1 and t2 to find 3-D point
correspondences.

Step 4. Estimate the rotation and translation of
the object from t1 to t2 by solving the
set of equations involving the motion
parameters.

Figure 1. Left-9 image of M114 APC.

* "Structure" means the 3-D coordinates of the features
that are extracted.
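
As an illustration of Step 4, the rotation and translation can be
recovered from matched 3-D points by a least-squares fit. The sketch below
uses the well-known SVD-based solution (often called the Kabsch or Arun
method). It is offered as one standard possibility, not as the solver
used in this project, and the function name `estimate_rigid_motion` is
hypothetical.

```python
import numpy as np


def estimate_rigid_motion(p1: np.ndarray, p2: np.ndarray):
    """Least-squares R, T such that p2 ~= p1 @ R.T + T.

    p1, p2: (N, 3) arrays of corresponding 3-D points at t1 and t2,
    with N >= 3 and the points not all collinear.
    """
    c1 = p1.mean(axis=0)
    c2 = p2.mean(axis=0)
    # Cross-covariance of the centred point sets.
    h = (p1 - c1).T @ (p2 - c2)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = c2 - r @ c1
    return r, t
```

With noisy range data the fit is only as good as the triangulated points,
so in practice the residuals of the fit would need to be inspected.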

We have an advantage over the monocular
two-view case because, as Huang [2] notes, Steps 2
and 3 are visually easier with stereo. For Step 2
this is because of the epipolar constraint: given a
fixed point in one image of the stereo pair, the
corresponding point in the other image is restricted
to lie on the epipolar line. Step 3 has a rigid-body
constraint: the distances between pairs of the 3-D
points on a rigid body are invariant under motion.
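
The rigid-body constraint can be checked directly on two sets of
triangulated points. This is a minimal sketch, assuming the points are
already in correspondence row by row. The function names and the
tolerance parameter `tol` are hypothetical and would have to be chosen
from the expected range error.

```python
import numpy as np


def pairwise_distances(points: np.ndarray) -> np.ndarray:
    """(N, N) matrix of Euclidean distances between rows of an (N, 3) array."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def rigid_body_consistent(p1: np.ndarray, p2: np.ndarray, tol: float) -> bool:
    """True if every inter-point distance changes by at most tol from p1 to p2."""
    change = np.abs(pairwise_distances(p1) - pairwise_distances(p2))
    return bool(np.all(change <= tol))
```

A candidate set of 3-D correspondences that fails this test contains at
least one mismatch or one badly triangulated point.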

The  two  constraints,  epipolar  and  rigid  body, 
give  us  an  advantage,  but  there  are  two  major 
difficulties. 

The first difficulty is that in Step 2, it is hard
to find feature extractors that yield common
features in a stereo pair. Our experience with
corner detectors, for example, is that the corners
found in the right image very infrequently
correspond with the corners found in the left
image. In some experiments the set of corresponding
corners comprised less than ten percent of the
total. We got similar results using the Moravec [3]
operator to detect "interesting" points. We were
disappointed in both Burns lines [4] and Canny
edges [5]. Very few of the Burns lines were common
to the left and right images, and Canny edges
tended to contain parts of several objects. For
example, an edge on a vehicle often merges into an
edge on parked cars, trees, or other background
objects.

The second difficulty is that in Step 3, good
results cannot be obtained. The problem is that
range information on the features is usually very
inaccurate [6] owing to image sampling.
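
The effect of image sampling on range can be illustrated with a
first-order error estimate for ideal parallel-axis stereo. This is a
sketch under assumptions not stated in the original: a pinhole model, a
disparity error of one picture element, and a pixel pitch of about
0.092 mm inferred from the 55 mm x 69.5 mm image scanned to 601 x 751
picture elements. Here $f$ is the focal length, $B$ the baseline, $d$ the
disparity measured on the film, and $Z$ the range.

```latex
\[
Z = \frac{fB}{d}
\quad\Longrightarrow\quad
|\delta Z| \approx \frac{Z^{2}}{fB}\,|\delta d| .
\]
With $f = 0.105\ \mathrm{m}$, $B = 3.6\ \mathrm{m}$ and
$|\delta d| \approx 9.2\times10^{-5}\ \mathrm{m}$:
\[
|\delta Z|_{Z=20\,\mathrm{m}} \approx \frac{400}{0.378}\times 9.2\times10^{-5}
\approx 0.10\ \mathrm{m},
\qquad
|\delta Z|_{Z=60\,\mathrm{m}} \approx \frac{3600}{0.378}\times 9.2\times10^{-5}
\approx 0.88\ \mathrm{m}.
\]
```

The error grows with the square of the range, so under these assumptions
a one-pixel matching error at the far end of the scene already displaces
a feature by a large fraction of a meter in depth.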

To  alleviate  these  difficulties,  several  new 
algorithms  were  developed.  We  finally  achieved 
success  by  substituting  for  Steps  2  and  3: 

Step 2a. Extract features in L1, and then find
corresponding features in R1 by some
form of cross correlation.

Step 2b. Similarly, extract and match features in
images L2, R2.

Step 2c. Similarly, extract and match features in
images L1, L2.

Step 2d. By taking the intersection of the three
matched sets of 2a, 2b and 2c, obtain
4-way feature correspondences among
images L1, R1, L2, R2.

Step 3. At t1 and t2, respectively, by triangulation
determine the 3-D coordinates of the
matched points.
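
The intersection in Step 2d can be sketched as follows, assuming each
pairwise matching step returns a table that maps a feature identifier in
one image to its match in the other. The function name and the
dictionary representation are hypothetical choices for illustration.

```python
def four_way_matches(l1_r1: dict, l2_r2: dict, l1_l2: dict) -> list:
    """Combine three pairwise match tables into (L1, R1, L2, R2) tuples.

    Each dict maps a feature id in the first image to its matched id in
    the second image, e.g. l1_r1[a] = b means feature a in L1 matches
    feature b in R1.
    """
    quads = []
    for f_l1, f_l2 in l1_l2.items():
        # Keep only features that were matched in all three tables.
        if f_l1 in l1_r1 and f_l2 in l2_r2:
            quads.append((f_l1, l1_r1[f_l1], f_l2, l2_r2[f_l2]))
    return quads
```

Only features that survive all three matching steps contribute a 3-D
point at both t1 and t2, which is what the motion estimate requires.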

Figure 2 shows the four-way matching. The
best matching points came from a new algorithm
based on zero-crossing patterns of second-order
derivatives [7]. Figure 3 shows the eleventh left
image from the second series with the edges
extracted from zero crossings.

We are also encouraged by preliminary
results with circle-finders*. Feature correspondences
on circles can be used both for motion estimation
and object recognition. Since the number
of circular structures in a scene is normally small,
finding correspondences is easier. Wheels usually
present themselves as ellipses in imagery.
McDonnell et al.* discuss transforming the ellipses
to circles.

Figure 2. Four-way point matching.

Figure 3. Edges from zero crossings.

In summary, after overcoming initial
difficulties our project has developed reliable and
robust algorithms for feature extraction [4],
matching [10, 11], and solution of the equations involving
the motion parameters [12]. Our experiments in
using control points for ground truth have been
reported [9]. We found our "ground truth" was good
enough to show that our results are at least
qualitatively correct. However, a quantitative
evaluation is not possible because our imaging setup was
not calibrated carefully enough and the cameras
were not metric.

5.  OBJECT  RECOGNITION 

Horaud and Skordas [13], among others, have
observed that the task of recognizing an object in a
random position and orientation is not trivial. An
object in space has six degrees of freedom
associated with it: three rotation parameters and three
translation parameters. In our case the vehicle
was constrained to move on a planar horizontal
surface (the parking lot), so there was no
up-and-down translational motion and the rotation is
around an axis orthogonal to the ground plane.
Only three degrees of freedom have to be
determined: one rotation parameter and two translation
parameters.

Our  object  recognition  efforts  began  only 
after  we  had  developed  a  reliable  package  for 
estimating  motion  parameters.  We  therefore 
assume  we  have  the  parameters  and  use  them  to 
help classify the vehicle. The motion parameters
are the basis for our main heuristic: "The
parameters of motion determine the orientation of the
vehicle."

We use orientation as a means to index into
the database of prototype vehicles. For example, if
the motion parameters show a large horizontal
translation, we assume we are looking at the side
of a vehicle, and we compare our photograph only
with side views of prototypes.

Our database is composed of three-dimensional
Constructive Solid Geometry (CSG) models.
A viewing angle for a model is specified as an
azimuth, $\alpha$, and an elevation. Our elevation is zero
because the parking lot is a planar surface. The
parameters of motion, a rotation matrix R and a
translation vector T, are two dimensional. We
therefore have

\[
R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
\qquad
T = \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}.
\]

The needed viewing angle, $\alpha$, is computed as the
sum of the two angles $\theta$ and $\phi = \tan^{-1}(\Delta y / \Delta x)$.
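
The viewing-angle computation just described (the azimuth as the sum of
the rotation angle of R and the direction angle of the translation T) can
be sketched as follows. The function names are hypothetical. Using
`atan2` in place of a plain arctangent, and wrapping the result to
[-pi, pi), are assumptions made here so that the sketch behaves sensibly
in all four quadrants. The original text does not state how these cases
were handled.

```python
import math


def rotation_angle(r) -> float:
    """Recover theta from a 2x2 rotation matrix [[cos, -sin], [sin, cos]]."""
    return math.atan2(r[1][0], r[0][0])


def viewing_angle(theta: float, dx: float, dy: float) -> float:
    """Azimuth alpha = theta + phi in radians, wrapped to [-pi, pi).

    theta: rotation angle of R; (dx, dy): components of the translation T.
    """
    phi = math.atan2(dy, dx)  # quadrant-aware form of arctan(dy / dx)
    alpha = theta + phi
    return (alpha + math.pi) % (2.0 * math.pi) - math.pi
```

The resulting azimuth, together with zero elevation, would then select
which rendered view of each prototype to compare against.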


Our paradigm for object recognition is the
symbolic inexact matching of a candidate with
several prototypes [14]. We use the BRL-CAD [15] solid
modeling system to generate our prototypes. The
package came with sample models of a tank and a
truck, and we generated a model of the APC
ourselves.

Figures 4, 5 and 6 show "wiresketches" of
prototypes  with  hidden  lines  removed. 


For the inexact matching itself, our intent
was to use the work of Boyer, Vayda and Kak [16],
which is a substantial extension of previous work
by Shapiro and Haralick [14]. We found, however,
that in our first preliminary experiments we were
able to match our photograph to the correct
wiresketch using the less elaborate method [11]. Part
of the reason for this result could be that we made
some simplifying assumptions (e.g. no noise
features in the photograph) and that our
partly-manual procedure did a better-than-average job of
feature extraction. It also appears that our
procedure gains strength from having two types of
primitives to match, lines and circles, instead of
only lines. We also have more constraints; for
example, some of our lines have to be parallel.

Figure 4. Wiresketch of a tank.

Figure 5. Wiresketch of an armored
personnel carrier.

Figure 6. Wiresketch of a truck.
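
To make the idea of inexact matching with two primitive types and a
parallelism constraint concrete, the following is a deliberately
simplified brute-force sketch. It is not the algorithm of the cited
references. The data layout (a list of primitive types plus a set of
parallel index pairs) and the function names are hypothetical, and
exhaustive search is only practical for a handful of primitives.

```python
from itertools import permutations


def match_cost(candidate, prototype):
    """Smallest number of violated relations over type-preserving mappings.

    candidate / prototype: dicts with
      "types":    list of primitive types, e.g. ["line", "line", "circle"]
      "parallel": set of frozenset({i, j}) index pairs of parallel lines
    Returns None if no type-preserving one-to-one mapping exists.
    """
    n = len(candidate["types"])
    best = None
    for perm in permutations(range(len(prototype["types"])), n):
        # A primitive may only map to a prototype primitive of the same type.
        if any(candidate["types"][i] != prototype["types"][perm[i]] for i in range(n)):
            continue
        # Count candidate relations that the mapping does not preserve.
        cost = sum(
            1
            for pair in candidate["parallel"]
            if frozenset(perm[i] for i in pair) not in prototype["parallel"]
        )
        if best is None or cost < best:
            best = cost
    return best


def best_prototype(candidate, prototypes):
    """Name of the prototype with the lowest match cost, or None."""
    scored = {name: match_cost(candidate, p) for name, p in prototypes.items()}
    scored = {name: c for name, c in scored.items() if c is not None}
    return min(scored, key=scored.get) if scored else None
```

A match is "inexact" in the sense that a mapping with a small nonzero cost
is still accepted. Requiring circles to map to circles and parallel lines
to parallel lines prunes many mappings that a lines-only description
would allow.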